What is Causal Inference?

A Fancy Name for an Old Way of Thinking

The Study of Cause & Effect

  • Causal inference is the process of estimating the mechanisms by which changes in variables (causes) lead to changes in an outcome (the effect).
  • It seeks to establish a cause and effect that is generalisable and substantively meaningful in the real world.
  • It’s more than just Randomised Controlled Trials!

Correlation != Causation



\[E[Y|T=1] - E[Y|T=0] = \underbrace{E[Y_1 - Y_0|T=1]}_{ATT} + \underbrace{\{ E[Y_0|T=1] - E[Y_0|T=0] \}}_{BIAS}\]

Facure Alves (2022)

Moving Beyond “Associations”

  • We are often interested in questions of “cause”, but we use methods (and language) more fit for description.
  • It is very common to see the term “association” used, to avoid claiming the outcome has been “caused” by our predictors (I am guilty of this).
    • But association still implies some causal effect (Haber et al. 2022).
    • And the recommendations and implications of the research clearly indicate a belief that the findings are causal!

Schrödinger’s Causal Inference


Our results suggest that “Schrödinger’s causal inference,” — where studies avoid stating (or even explicitly deny) an interest in estimating causal effects yet are otherwise embedded with causal intent, inference, implications, and recommendations — is common.

Haber et al. (2022)

More Than a Name

  • While “causal inference” has become the popular terminology, these ideas are not entirely new.
  • However, the approach (DAGs, explicit processes for drawing causal conclusions) has made the old methods more precise.
  • Much of causal inference focuses on methods for making causal claims possible when dealing with observational data.

Visualising Causal Models Using DAGs


Gittens, Yener, and Yung (2022)

Classifying Data Science Tasks

Understanding the Type of Questions We’re Asking & the Different Methods for Answering Them

The Challenge

  • It is sometimes difficult to tell whether you are dealing with description, prediction, or explanation.
  • This is because methods like regression are appropriate for all three and the way we talk about each often blurs the lines (Barrett, McGowan, and Gerke 2024).

Description - What Happened?

  • “Description is using data to provide a quantitative summary of certain features of the world.” (Hernán, Hsu, and Healy 2019)
  • Description is concerned with understanding what happened.
  • Counting events, estimating central tendencies (mean, median, mode) and variance, and visualising distributions.
  • Often used in combination with prediction and causal inference.

Counting Things is Good, Actually

  • Description is frequently treated as a lesser task, but I think this is fundamentally wrong.
  • Counting is one of the best things we can do with data (Barrett, McGowan, and Gerke 2024)!
  • It can be very valuable when the quantity is extremely important.
  • It can also be sufficient if we already know a lot about it from previous causal work.

Don’t Mistake Description for Explanation!

  • It is very easy to fall into the trap of treating description as causal, or using causal terms when discussing descriptive findings.

Prediction - What Will Happen?

  • “Prediction is using data to map some features of the world (the inputs) to other features of the world (the outputs).” (Hernán, Hsu, and Healy 2019)
  • Prediction focuses on identifying what will happen in the future.
    • How many A&E attendances will occur in the next hour, day, week, or month?
    • What is the probability of an individual suffering a health episode?
  • Machine learning is very good at answering these kinds of questions.

Causal Inference - What Was the Cause?

  • Causal inference tries to identify and quantify cause and effect.
    • What are the primary causal factors that define how long someone will wait in A&E?
    • What are the environmental factors that lead to someone developing cancer?
    • Do virtual wards reduce the quantity of hospital attendances an individual experiences?

The Differences Between Predictive & Causal Models

  • Causal models are interested in unbiased estimates, prediction focuses on reducing error.
  • Predictive models can (and should) include plenty of correlations
  • With causal models, fewer variables and focus should be on causal mechanism
  • A causal model can identify a small but meaningful effect that would not predict the outcome on its own and this would still be useful!

Conclusion

  • Understanding the type of question you’re asking will define the methods you will use to find the answer.
  • No approach is inherently more valuable, but they serve specific purposes!
  • Treating description as demonstrating causality is problematic.
  • Treating prediction and explanation as equivalent will lead to building models that are not fit for the task.

Further Resources

Thank You!

Contact:

Code & Slides:

References

Barrett, Malcolm, Lucy D’Agostino McGowan, and Travis Gerke. 2024. Causal Inference in r. https://www.r-causal.org/.
Facure Alves, Matheus. 2022. Causal Inference for the Brave and True. https://matheusfacure.github.io/python-causality-handbook/landing-page.html.
Gittens, Alex, Bülent Yener, and Moti Yung. 2022. “An Adversarial Perspective on Accuracy, Robustness, Fairness, and Privacy: Multilateral-Tradeoffs in Trustworthy ML.” IEEE Access 10: 120850–65.
Haber, Noah A, Sarah E Wieten, Julia M Rohrer, Onyebuchi A Arah, Peter WG Tennant, Elizabeth A Stuart, Eleanor J Murray, et al. 2022. “Causal and Associational Language in Observational Health Research: A Systematic Evaluation.” American Journal of Epidemiology 191 (12): 2084–97. https://academic.oup.com/aje/article/191/12/2084/6655746.
Hernán, Miguel A, John Hsu, and Brian Healy. 2019. “A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks.” Chance 32 (1): 42–49.